The first COVID-19 GWAS to find a genome-wide significant effect defined a credible set of 22 SNPs at chromosome 3p21.31 (lead SNP rs11385942). First off, here are the COVID-associated SNPs we’re using.
## [1] "3_45805277_A_G" "3_45801750_G_A" "3_45807268_G_C"
## [4] "3_45801823_C_T" "3_45801947_G_T" "3_45859142_G_C"
## [7] "3_45867532_A_G" "3_45866624_A_T" "3_45818159_G_A"
## [10] "3_45858159_A_G" "3_45825948_A_G" "3_45847198_A_G"
## [13] "3_45820440_G_A" "3_45821460_T_C" "3_45859597_C_T"
## [16] "3_45823240_T_C" "3_45830416_G_A" "3_45867022_C_G"
## [19] "3_45848429_A_T" "3_45838989_T_C" "3_45848457_C_T"
## [22] "3_45834967_G_GA"
To compare with eQTLs in the region, we will look at all SNPs within 200 kb of these SNPs, to determine whether any of the COVID SNPs are credibly causal for an eQTL effect.
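As a rough illustration of the 200 kb window, the sketch below parses the variant IDs listed above (assumed here to be in `chrom_pos_ref_alt` form) and computes a single interval covering every SNP plus a 200 kb flank on each side. It is written in Python purely for illustration; the function names are hypothetical and this is not necessarily how the region was defined in the original analysis.

```python
# The 22 credible-set variant IDs listed above (assumed chrom_pos_ref_alt format).
COVID_SNPS = [
    "3_45805277_A_G", "3_45801750_G_A", "3_45807268_G_C",
    "3_45801823_C_T", "3_45801947_G_T", "3_45859142_G_C",
    "3_45867532_A_G", "3_45866624_A_T", "3_45818159_G_A",
    "3_45858159_A_G", "3_45825948_A_G", "3_45847198_A_G",
    "3_45820440_G_A", "3_45821460_T_C", "3_45859597_C_T",
    "3_45823240_T_C", "3_45830416_G_A", "3_45867022_C_G",
    "3_45848429_A_T", "3_45838989_T_C", "3_45848457_C_T",
    "3_45834967_G_GA",
]

FLANK_BP = 200_000  # 200 kb either side, as stated in the text


def parse_variant(variant_id):
    """Split 'chrom_pos_ref_alt' into (chrom, pos, ref, alt)."""
    chrom, pos, ref, alt = variant_id.split("_")
    return chrom, int(pos), ref, alt


def region_around(variant_ids, flank=FLANK_BP):
    """Return (chrom, start, end) spanning all variants plus the flank.

    Assumes the variants sit close enough together that the per-SNP
    windows overlap, so one interval describes the whole region.
    """
    parsed = [parse_variant(v) for v in variant_ids]
    chroms = {p[0] for p in parsed}
    if len(chroms) != 1:
        raise ValueError("expected all variants on one chromosome")
    positions = [p[1] for p in parsed]
    start = max(min(positions) - flank, 1)  # clamp at chromosome start
    end = max(positions) + flank
    return chroms.pop(), start, end


chrom, start, end = region_around(COVID_SNPS)
print(f"{chrom}:{start}-{end}")  # 3:45601750-46067532
```

The credible set spans roughly 66 kb, so the individual 200 kb windows overlap and a single interval is enough.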
Let’s look at the minimum p value for COVID credible set SNPs for any gene (within 100 kb) in each QTL dataset. We’ll do this first for the eQTL Catalogue datasets.
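The per-dataset summary can be sketched as below. This is a minimal Python illustration on invented toy records, not the code behind the plots: the dataset names, gene names, p values and the non-credible-set variant are all hypothetical, and the records are assumed to be already restricted to genes within 100 kb.

```python
# Hypothetical toy associations: (dataset, gene, variant_id, p_value).
# Dataset names, gene names and p values are invented for illustration only.
toy_records = [
    ("dataset_A", "GENE1", "3_45805277_A_G", 2e-4),
    ("dataset_A", "GENE2", "3_45801750_G_A", 5e-7),
    ("dataset_A", "GENE2", "3_45700000_C_T", 1e-9),  # made-up non-COVID SNP
    ("dataset_B", "GENE1", "3_45805277_A_G", 0.03),
]

# Two of the credible-set SNPs listed above, enough for the toy example.
covid_snps = {"3_45805277_A_G", "3_45801750_G_A"}


def min_p_per_dataset(records, snp_set):
    """For each dataset, keep the (gene, variant, p) with the smallest p
    among associations whose variant is in snp_set."""
    best = {}
    for dataset, gene, variant, p in records:
        if variant not in snp_set:
            continue  # only credible-set SNPs count here
        if dataset not in best or p < best[dataset][2]:
            best[dataset] = (gene, variant, p)
    return best


for dataset, (gene, variant, p) in sorted(min_p_per_dataset(toy_records, covid_snps).items()):
    print(dataset, gene, variant, p)
```

The same function applies unchanged to a second collection of datasets, provided its records are brought into the same four-field shape.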
Next, we do the same for GTEx.
We can also look at the minimum p value across QTL datasets for each gene, for both COVID SNPs and other SNPs. First for the eQTL Catalogue.
And also for GTEx.
SLC6A20 is the only gene for which a COVID SNP is also the top QTL SNP across datasets. Other genes where the top COVID SNP is near the top SNP overall include LZTFL1 and CCR9.
We would like to know if any of the COVID SNPs are credibly causal for the eQTL. This is not a proper colocalisation analysis, but we can get some of the way there by checking whether any COVID SNP has a QTL p value within a couple of orders of magnitude of the minimum QTL p value for the same gene.
We will only do this for genes where at least one COVID SNP has a QTL p value smaller than 1e-3. For this, let’s merge the eQTL Catalogue and GTEx results in each plot.
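The check described above can be sketched as a comparison of log10 p values. The Python snippet below is an illustration under stated assumptions: "a couple of orders of magnitude" is taken to mean a gap of at most 2 on the log10 scale, the 1e-3 filter is applied per gene within a single dataset, all p values are strictly positive, and the function and constant names are hypothetical.

```python
import math

P_GENE_FILTER = 1e-3  # from the text: a COVID SNP must reach p < 1e-3
MAX_LOG10_GAP = 2.0   # assumption: "a couple of orders of magnitude" = 2


def covid_vs_top(assocs, covid_snps):
    """Compare the best COVID SNP with the best SNP overall for one gene.

    assocs: list of (variant_id, p_value) for one gene in one dataset,
            with every p_value > 0.
    Returns None if no COVID SNP passes the p < 1e-3 filter.
    """
    covid = [a for a in assocs if a[0] in covid_snps]
    if not covid:
        return None
    top_variant, top_p = min(assocs, key=lambda a: a[1])
    covid_variant, covid_p = min(covid, key=lambda a: a[1])
    if covid_p >= P_GENE_FILTER:
        return None
    # How many orders of magnitude the top COVID SNP lags the top SNP.
    gap = math.log10(covid_p) - math.log10(top_p)
    return {
        "top": top_variant,
        "top_covid": covid_variant,
        "log10_gap": gap,
        "candidate": gap <= MAX_LOG10_GAP,
    }


# Toy example with invented p values; "3_45700000_C_T" is a made-up SNP.
covid_snps = {"3_45805277_A_G"}
close = covid_vs_top([("3_45700000_C_T", 1e-8), ("3_45805277_A_G", 1e-7)], covid_snps)
far = covid_vs_top([("3_45700000_C_T", 1e-12), ("3_45805277_A_G", 1e-5)], covid_snps)
print(close["candidate"], far["candidate"])
```

A small gap is only suggestive: two SNPs with similar p values may or may not be in LD, so this stays well short of a formal colocalisation test.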
The direction of the arrow indicates whether the SNP is associated with increased (upwards arrow) or decreased (downwards arrow) gene expression. Note that for COVID SNPs, this also indicates the direction of effect on COVID risk, since the ALT allele is the effect allele. For non-COVID SNPs, we can’t know for sure the direction of effect on COVID risk. (Where the p values of the top SNP and the top COVID SNP are close, it’s more likely that the two are in LD, and so the arrow then reflects the direction of COVID risk. But we would have to check the LD, or check the GWAS summary statistics, to be sure.)
We are hoping to find cases where the p value of the top COVID SNP (blue) is similar to the p value of the top SNP overall (red), and the effect is in the same direction. In general it makes sense to ignore QTL datasets where the top SNP has a p value worse than about 1e-3 or 1e-4, since these are most likely just noise. I would only pay attention to datasets where there is a clear QTL effect.
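The reading rule for the plots can be written down as a small filter. The Python sketch below assumes each SNP is summarised by a `(p_value, beta)` pair with beta expressed relative to the ALT allele, and uses 1e-3 as the noise cut-off; the names are hypothetical and the example values are invented.

```python
NOISE_P = 1e-3  # datasets whose top SNP is weaker than this are ignored


def arrow(beta):
    """Arrow used to show the direction of effect on expression."""
    if beta > 0:
        return "↑"
    if beta < 0:
        return "↓"
    return "·"  # no direction if the estimated effect is exactly zero


def clear_and_concordant(top, top_covid, noise_p=NOISE_P):
    """True if the dataset shows a clear QTL and both SNPs point the same way.

    top, top_covid: (p_value, beta) for the top SNP overall and the top
    COVID SNP, with beta relative to the ALT allele (an assumption here).
    """
    top_p, top_beta = top
    _, covid_beta = top_covid
    if top_p > noise_p:
        return False  # no clear QTL effect in this dataset
    return arrow(top_beta) == arrow(covid_beta) != "·"


# Invented example values.
print(arrow(0.4), arrow(-0.2))
print(clear_and_concordant((1e-9, 0.5), (1e-8, 0.4)))   # clear QTL, same direction
print(clear_and_concordant((1e-9, 0.5), (1e-8, -0.4)))  # opposite directions
print(clear_and_concordant((0.02, 0.5), (0.05, 0.4)))   # top SNP is just noise
```

Matching arrows between two different SNPs are only informative if the SNPs are in LD, which this filter does not check.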
Genes of interest based on this:
Other genes, such as FYCO1, LZTFL1, and CCR9, in general don’t show COVID SNPs as candidates for being causal.
Overall, I don’t feel this gives strong support for any gene, since we don’t have a case where many independent datasets point to the same gene with a COVID SNP near the top.